Dev/518/formula #309
Merged
This version:
- Adds null/total counts for every variable
- Replaces variable code with label
- Removes unnecessary lower/upper confidence intervals
…38/descr-stats-show-nulls
Algorithms now accept a new formula argument. This argument expects a description of a formula in Wilkinson notation, in JSON format. From the description, an actual string formula is constructed and passed to the patsy library, which builds the algorithm's design matrices. The rest of the execution remains unchanged. Currently only a subset of the full Wilkinson notation is implemented. Interactions are limited to 3 terms, and singleton terms support a limited set of transformations: nop, log, exp, center, standardize, mul and div for numerical variables.
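To make the description-to-string step concrete, here is a minimal sketch of how a JSON description might be turned into a right-hand-side formula string. The JSON layout (`terms`, `single`, `interaction`, `var`, `transform`, `arg`) and the function name `build_formula` are assumptions made for illustration; the actual schema used in this PR is not shown here. Only the transformation names and the three-term interaction limit come from the commit message.

```python
import json

# Hypothetical mapping from a transformation name to a patsy-style expression.
# The names (nop, log, exp, center, standardize, mul, div) come from the commit
# message; the way they are rendered here is an assumption.
_TRANSFORMS = {
    "nop": lambda var, arg: var,
    "log": lambda var, arg: f"np.log({var})",
    "exp": lambda var, arg: f"np.exp({var})",
    "center": lambda var, arg: f"center({var})",
    "standardize": lambda var, arg: f"standardize({var})",
    "mul": lambda var, arg: f"I({var} * {arg})",
    "div": lambda var, arg: f"I({var} / {arg})",
}

MAX_INTERACTION_VARS = 3  # interactions are limited to 3 terms


def build_formula(description_json):
    """Build a right-hand-side formula string from a (hypothetical) JSON description."""
    description = json.loads(description_json)
    terms = []
    for term in description["terms"]:
        if "single" in term:
            # Singleton term: one variable with an optional transformation.
            single = term["single"]
            name = single.get("transform", "nop")
            if name not in _TRANSFORMS:
                raise ValueError(f"Unsupported transformation: {name}")
            terms.append(_TRANSFORMS[name](single["var"], single.get("arg")))
        else:
            # Interaction term: variables joined with ':'.
            variables = term["interaction"]
            if len(variables) > MAX_INTERACTION_VARS:
                raise ValueError("Interactions support at most 3 variables")
            terms.append(":".join(variables))
    return " + ".join(terms)


# Example description using made-up variable names.
description = json.dumps({
    "terms": [
        {"single": {"var": "age", "transform": "log"}},
        {"single": {"var": "height", "transform": "div", "arg": 100}},
        {"interaction": ["age", "weight"]},
    ]
})
formula = build_formula(description)
print(formula)
```

A string like this could then be handed to `patsy.dmatrix(formula, data)`, provided `numpy` is available as `np` where the formula is evaluated; `center`, `standardize` and `I` are patsy built-ins.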
Currently the logistic_regression tests with a formula are failing. There is a subtle interaction between the formula coding of categorical variables and the add_missing_levels method. The reason is that add_missing_levels relies on the known enumerations of categorical variables found in the CDEs, but the codings supported by the formula might add different enumerations to categorical variables.
Working version of descriptive_stats with an optional formula parameter. The formula applies only to model results, as it makes little sense for single-variable results. Moreover, all floats in the result are now rounded to two decimals for presentation, and the tests have been corrected accordingly.
…nto dev/518/formula
ThanKarab
approved these changes
Dec 21, 2021
formula_description property in algorithm specifications.
formula_description argument. This argument expects a description of a formula in Wilkinson notation in JSON format. From the description, an actual string formula is constructed and passed to the patsy library, from which the algorithm's design matrices are constructed.